Train AI Models


Are tech companies using your private data to train AI models?

Al Jazeera

Leading tech companies are in a race to release and improve artificial intelligence (AI) products, leaving users in the United States to puzzle out how much of their personal data could be extracted to train AI tools. Meta (which owns Facebook, Instagram, Threads and WhatsApp), Google and LinkedIn have all rolled out AI features that can draw on users' public profiles or emails. Google and LinkedIn offer users ways to opt out of the AI features, while Meta's AI tool provides no means for its users to say "no, thanks." Posts warned that the platforms' AI tool rollouts make most private information available for tech company harvesting.


OpenAI breaks ranks with Tech Council of Australia over heated copyright issue

The Guardian

The chief global affairs officer of the company behind ChatGPT told a Sydney audience "we are going to be in Australia, one way or the other" (Fri 17 Oct 2025). "No, we are going to be in Australia, one way or the other." "We will engage in either country - we will find ways to work with those who want to build up big frontier models and have robust ecosystems, or those who just want to have much more narrowly defined AI," he said. "We will work with them under either scenario, regardless." "This is the nature of how technology works. Innovations come along, and then societies adapt to those innovations," he said.


Arts and media groups demand Labor take a stand against 'rampant theft' of Australian content to train AI

The Guardian

Arts, creative and media groups have demanded the government rule out allowing big tech companies to take Australian content to train their artificial intelligence models, warning such a shift would "sell out" Australian workers and lead to "rampant theft" of intellectual property. "It is not appropriate for big tech to steal the work of Australian artists, musicians, creators, news media, journalism, and use it for their own ends without paying for it," the opposition leader, Sussan Ley, said on Wednesday. In an interim report on "harnessing data and digital technology", the Productivity Commission set out proposals for how tech, including AI, could be regulated and treated in Australia, suggesting it could boost productivity by between 0.5% and 13% over the next decade, adding up to $116bn to Australia's GDP. The commission suggested several possible remedies, including expanding licensing schemes, creating an exemption for "text and data mining", or expanding the existing fair dealing rules, which it said existed in other countries. The latter suggestions prompted fierce pushback from arts, creative and media companies, which raised alarm that their work could be left open for massively wealthy tech companies to use, without compensation or payment, to train AI models.


Apple will use its street view Maps photos to train AI models

Engadget

Apple plans to start using images it collects for Maps to train its AI models. In a disclosure spotted by 9to5Mac, the company said that, starting this month, the images it captures to provide its Look Around feature would also be used to train some of its generative AI models. Look Around is Apple's answer to Google Street View; the company originally released the feature alongside its 2019 revamp of Apple Maps, and the tool allows users to see locations from ground level.


Zuckerberg approved Meta's use of 'pirated' books to train AI models, authors claim

The Guardian

Citing internal Meta communications, the filing claims that the social network company's chief executive backed the use of the LibGen dataset, a vast online archive of books, despite warnings within the company's AI executive team that it is a dataset "we know to be pirated". The internal message says that using a database containing pirated material could weaken the Facebook and Instagram owner's negotiations with regulators, according to the filing. "Media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, may undermine our negotiating position with regulators." The authors sued Meta in 2023, arguing that the social media company misused their books to train Llama, the large language model that powers its chatbots. The Library Genesis, or LibGen, dataset is a "shadow library" that originated in Russia and claims to contain millions of novels, nonfiction books and science magazine articles.


Ukraine collects vast war data trove to train AI models

The Japan Times

As the future of warfare pivots toward artificial intelligence, Ukraine is sitting on a valuable resource: millions of hours of footage from drones, which can be used to train AI models to make decisions on the battlefield. AI has been deployed by both sides during Russia's invasion of Ukraine to identify targets, scanning images far more quickly than a human can. Oleksandr Dmitriev, founder of OCHI, a nonprofit Ukrainian digital system that centralizes and analyzes video feeds from more than 15,000 drone crews working on the front lines, said his system had collected 2 million hours, or 228 years, of battlefield video from drones since 2022.


Thom Yorke and Julianne Moore join thousands of creatives in AI warning

The Guardian

Abba's Björn Ulvaeus, the actor Julianne Moore and the Radiohead singer Thom Yorke are among 10,500 signatories of a statement from the creative industries warning artificial intelligence companies that unlicensed use of their work is a "major, unjust threat" to artists' livelihoods. "The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted," reads the statement. Thousands of creative professionals from the worlds of literature, music, film, theatre and television have given their backing to the statement, with authors including Kazuo Ishiguro, Ann Patchett and Kate Mosse, musicians including the Cure's Robert Smith as well as the composer Max Richter, and actors including Kevin Bacon, Rosario Dawson and F Murray Abraham. The organiser of the letter, the British composer and former AI executive Ed Newton-Rex, said people who make a living from creative work are "very worried" about the situation. "There are three key resources that generative AI companies need to build AI models: people, compute, and data. They spend vast sums on the first two - sometimes a million dollars per engineer, and up to a billion dollars per model. But they expect to take the third - training data - for free," he said.


X updates its privacy policy to allow third parties to train AI models with its data

Engadget

X is updating its privacy policy with new language that allows it to provide users' data to third-party "collaborators" in order to train AI models. The new policy, which takes effect November 15, 2024, would seem to open the door to Reddit-like arrangements in which outside companies can pay to license data from X. The updated policy shared by X includes a new section titled "third-party collaborators": "Depending on your settings, or if you decide to share your data, we may share or disclose your information with third parties. If you do not opt out, in some instances the recipients of the information may use it for their own independent purposes in addition to those stated in X's Privacy Policy, including, for example, to train their artificial intelligence models, whether generative or otherwise."


Apple, NVIDIA and Anthropic reportedly used YouTube transcripts without permission to train AI models

Engadget

Some of the world's largest tech companies trained their AI models on a dataset that included transcripts of more than 173,000 YouTube videos without permission, a new investigation from Proof News has found. The dataset, which was created by a nonprofit called EleutherAI, contains transcripts of YouTube videos from more than 48,000 channels and was used by Apple, NVIDIA and Anthropic, among other companies. The findings of the investigation spotlight AI's uncomfortable truth: the technology is largely built on the backs of data siphoned from creators without their consent or compensation. The dataset doesn't include any videos or images from YouTube, but it contains video transcripts from the platform's biggest creators, including Marques Brownlee and MrBeast, as well as large news publishers like The New York Times, the BBC and ABC News. Subtitles from videos belonging to Engadget are also part of the dataset.


TikTok's AI efforts reportedly exploit loopholes to use premium Nvidia chips

Engadget

The US has banned companies like Nvidia from selling their most advanced AI chips to China since 2022. But if loopholes exist, profit-hungry corporations will find and exploit them. The Information published a bombshell report on Thursday detailing how Oracle allows TikTok owner ByteDance to rent Nvidia's most advanced chips to train AI models on US soil. ByteDance, which many US lawmakers believe has direct ties to the Chinese government, is reportedly renting US-based servers containing Nvidia's coveted H100 chips from US cloud computing company Oracle to train AI models. The practice, which runs against the spirit of the US government's chip regulations, is technically allowed because Oracle is merely renting out the chips on American soil, not selling them to companies in China.